Search Results: "rcp"

11 June 2023

Michael Prokop: What to expect from Debian/bookworm #newinbookworm

Bookworm banner, copyright 2022 Juliette Taka. Debian v12 with codename bookworm was released as the new stable release on 10th of June 2023. Similar to what we had with #newinbullseye and previous releases, now it's time for #newinbookworm! I was the driving force at several of my customers to be well prepared for bookworm. As usual with major upgrades, there are some things to be aware of, and hereby I'm starting my public notes on bookworm that might also be worth reading for other folks. My focus is primarily on server systems, looking at things from a sysadmin perspective. Further readings: as usual, start at the official Debian release notes, and make sure to especially go through What's new in Debian 12 + Issues to be aware of for bookworm. Package versions: as a starting point, let's look at some selected packages and their versions in bullseye vs. bookworm as of 2023-02-10 (mainly having amd64 in mind):
Package bullseye/v11 bookworm/v12
ansible 2.10.7 2.14.3
apache 2.4.56 2.4.57
apt 2.2.4 2.6.1
bash 5.1 5.2.15
ceph 14.2.21 16.2.11
docker 20.10.5 20.10.24
dovecot 2.3.13 2.3.19
dpkg 1.20.12 1.21.22
emacs 27.1 28.2
gcc 10.2.1 12.2.0
git 2.30.2 2.39.2
golang 1.15 1.19
libc 2.31 2.36
linux kernel 5.10 6.1
llvm 11.0 14.0
lxc 4.0.6 5.0.2
mariadb 10.5 10.11
nginx 1.18.0 1.22.1
nodejs 12.22 18.13
openjdk 11.0.18 + 17.0.6 17.0.6
openssh 8.4p1 9.2p1
openssl 1.1.1n 3.0.8-1
perl 5.32.1 5.36.0
php 7.4+76 8.2+93
podman 3.0.1 4.3.1
postfix 3.5.18 3.7.5
postgres 13 15
puppet 5.5.22 7.23.0
python2 2.7.18 (gone!)
python3 3.9.2 3.11.2
qemu/kvm 5.2 7.2
ruby 2.7+2 3.1
rust 1.48.0 1.63.0
samba 4.13.13 4.17.8
systemd 247.3 252.6
unattended-upgrades 2.8 2.9.1
util-linux 2.36.1 2.38.1
vagrant 2.2.14 2.3.4
vim 8.2.2434 9.0.1378
zsh 5.8 5.9
Linux Kernel: The bookworm release ships a Linux kernel based on version 6.1, whereas bullseye shipped kernel 5.10. As usual there are plenty of changes in the kernel area, including better hardware support, which might warrant a separate blog entry; see Kernelnewbies.org for the changes between kernel versions.

Configuration management: puppet's upstream sadly still doesn't provide packages for bookworm (see PA-4995), though Debian provides puppet-agent and puppetserver packages, and even puppetdb is back again; see the release notes for further information. ansible is also available and made it into bookworm with version 2.14.

Prometheus stack: The Prometheus server was updated from v2.24.1 to v2.42.0, and all the exporters that were shipped with bullseye are still around (in more recent versions, of course).

Virtualization: docker (v20.10.24), ganeti (v3.0.2-3), libvirt (v9.0.0-4), lxc (v5.0.2-1), podman (v4.3.1), openstack (Zed), qemu/kvm (v7.2) and xen (v4.17.1) are all still around. Vagrant is available in version 2.3.4, and Vagrant upstream already provides their own packages for bookworm. If you're relying on VirtualBox, be aware that upstream doesn't provide packages for bookworm yet (see ticket 21524), but thankfully version 7.0.8-dfsg-2 is available from Debian/unstable (as of 2023-06-10). (VirtualBox hasn't been shipped with stable releases for quite some time, due to lack of cooperation from upstream on security support for older releases; see #794466.)

rsync: rsync was updated from v3.2.3 to v3.2.7, and we got a few new options.

OpenSSH: OpenSSH was updated from v8.4p1 to v9.2p1. If you're interested in all the changes, check out the release notes between those versions (8.5, 8.6, 8.7, 8.8, 8.9, 9.0, 9.1 + 9.2). One important change you might want to be aware of: as of OpenSSH v8.8, RSA signatures using the SHA-1 hash algorithm are disabled by default; RSA/SHA-256/512 AKA RSA-SHA2 is used instead. OpenSSH has supported RFC 8332 RSA/SHA-256/512 signatures since release 7.2, and existing ssh-rsa keys will automatically use the stronger algorithm where possible. A good overview is also available at SSH: Signature Algorithm ssh-rsa Error. The consequence is that tools and libraries not supporting RSA-SHA2 fail to connect to OpenSSH as present in bookworm. For example, python3-paramiko v2.7.2-1 as present in bullseye doesn't support RSA-SHA2: it tries to connect using the deprecated ssh-rsa (SHA-1) algorithm, which is no longer offered by default by OpenSSH as present in bookworm, and then fails. Support for RSA/SHA-256/512 signatures in Paramiko was requested e.g. in #1734, eventually got added to Paramiko, and the change made it into Paramiko versions >=2.9.0. Paramiko in bookworm works fine, and a backport by rebuilding the python3-paramiko package from bookworm for bullseye solves the problem (BTDT). A minimal Paramiko connectivity check is sketched at the end of this entry.

Misc unsorted: Thanks to everyone involved in the release, happy upgrading to bookworm, and let's continue with working towards Debian/trixie. :)
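To illustrate the Paramiko behaviour mentioned above, here is a minimal sketch that checks whether an RSA-key login succeeds against a bookworm host; the host name, user and key path are placeholders, not taken from anywhere above. With python3-paramiko >=2.9.0 the client transparently negotiates rsa-sha2-256/512; with the bullseye version (2.7.2) the connect() call fails, because the server no longer accepts ssh-rsa (SHA-1) signatures by default:

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
try:
    # Placeholder host, user and key path: adjust to your environment.
    client.connect("bookworm.example.org", username="admin",
                   key_filename="/home/admin/.ssh/id_rsa")
    print("connected, RSA-SHA2 was negotiated")
except paramiko.AuthenticationException as exc:
    # Paramiko < 2.9 ends up here against OpenSSH >= 8.8 defaults,
    # as it only offers ssh-rsa (SHA-1) signatures.
    print(f"authentication failed: {exc}")
finally:
    client.close()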

27 May 2023

Dirk Eddelbuettel: RcppArmadillo 0.12.4.0.0 on CRAN: New Upstream Minor

Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 1074 other packages on CRAN, downloaded 29.3 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 535 times according to Google Scholar. This release brings a new upstream release 12.4.0 made by Conrad a day or so ago. I prepared the usual release candidate, tested it on the over 1000 reverse depends (which sadly takes almost a day on old hardware), found no issues, and sent it to CRAN. There it got tested again and was once again auto-processed smoothly within a few hours on a Friday night, which is just marvelous. So this time I tweeted about it too. The release actually has a relatively small set of changes as a second follow-up release in the 12.* series.

Changes in RcppArmadillo version 0.12.4.0.0 (2023-05-26)
  • Upgraded to Armadillo release 12.4.0 (Cortisol Profusion Redux)
    • Added norm2est() for finding fast estimates of matrix 2-norm (spectral norm)
    • Added vecnorm() for obtaining the vector norm of each row or column of a matrix

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like my open-source work, you may consider sponsoring me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

15 May 2023

Dirk Eddelbuettel: RcppSimdJson 0.1.10 on CRAN: New Upstream

We are happy to share that the RcppSimdJson package has been updated to release 0.1.10. RcppSimdJson wraps the fantastic and genuinely impressive simdjson library by Daniel Lemire and collaborators. Via very clever algorithmic engineering to obtain largely branch-free code, coupled with modern C++ and newer compiler instructions, it parses gigabytes of JSON per second, which is quite mindboggling. The best-case performance is faster than CPU speed as the use of parallel SIMD instructions and careful branch avoidance can lead to less than one CPU cycle per byte parsed; see the video of the talk by Daniel Lemire at QCon. This release updates the underlying simdjson library to version 3.1.8 (also made today). Otherwise we only made a minor edit to the README and adjusted one tweak for code coverage. The (very short) NEWS entry for this release follows.

Changes in version 0.1.10 (2023-05-14)
  • simdjson was upgraded to version 3.1.8 (Dirk in #85).

Courtesy of my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

2 May 2023

Dirk Eddelbuettel: RQuantLib 0.4.18 on CRAN: Maintenance

A new release 0.4.18 of RQuantLib arrived at CRAN earlier today, and will be uploaded to Debian as well. QuantLib is a very comprehensive free/open-source library for quantitative finance; RQuantLib connects it to the R environment and language. This release of RQuantLib comes about six months after the previous maintenance release. It brings a few small updates triggered by small changes in the QuantLib releases 1.29 and 1.30. It also contains updates reflecting changes in the rgl package kindly contributed by Duncan Murdoch. Last but not least, Jeroen Ooms helped with two pull requests updating builds on, respectively, macOS and Windows by updating the pre-made libraries of QuantLib.

Changes in RQuantLib version 0.4.18 (2023-05-01)
  • Use of several rgl functions was updated to a new naming scheme in the package (kindly contributed by Duncan Murdoch in #174)
  • A default argument is now given for option surface plots
  • Changed call from SwaptionVolCube1 to SabrSwaptionVolatilityCube (conditional on using QuantLib 1.30 or later)
  • Some other deprecation warnings were tweaked as in the QL test file
  • Builds for macOS and Windows were updated with newer pre-made libraries (changes kindly contributed by Jeroen Ooms in #176 and #175)
  • Some remaining throw calls were replaced by Rcpp::stop

Courtesy of my CRANberries, there is also a diffstat report for this release. As always, more detailed information is on the RQuantLib page. Questions, comments etc should go to the new rquantlib-devel mailing list. Issue tickets can be filed at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

6 April 2023

Reproducible Builds: Reproducible Builds in March 2023

Welcome to the March 2023 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. As a quick recap, the motivation behind the reproducible builds effort is to ensure no malicious flaws have been introduced during the compilation and distribution processes. It does this by ensuring identical results are always generated from a given source, thus allowing multiple third parties to come to a consensus on whether a build was compromised. If you are interested in contributing to the project, please do visit our Contribute page on our website.

News There was progress towards making the Go programming language reproducible this month; the overall goal remains making the Go binaries distributed by Google, Arch Linux, and others bit-for-bit identical. These changes could become part of the upcoming version 1.21 release of Go. An issue in the Go issue tracker (#57120) is being used to follow and record progress on this.
Arnout Engelen updated our website to add and update reproducibility-related links for NixOS, pointing to reproducible.nixos.org [ ]. In addition, Chris Lamb made some cosmetic changes to our presentations and resources page. [ ][ ]
Intel published a guide on how to reproducibly build their Trust Domain Extensions (TDX) firmware. TDX here refers to an Intel technology that combines their existing virtual machine and memory encryption technology with a new kind of virtual machine guest called a Trust Domain. This runs the CPU in a mode that protects the confidentiality of its memory contents and its state from any other software.
A reproducibility-related bug from early 2020 in the GNU GCC compiler has been fixed. The issue was that if GCC was invoked via the as frontend, the -ffile-prefix-map option was being ignored. We were tracking this in Debian via the build_path_captured_in_assembly_objects issue. It has now been fixed and will be reflected in GCC version 13.
Holger Levsen will present at foss-north 2023 in April of this year in Gothenburg, Sweden on the topic of Reproducible Builds, the first ten years.
Anthony Andreoli, Anis Lounis, Mourad Debbabi and Aiman Hanna of the Security Research Centre at Concordia University, Montreal published a paper this month entitled On the prevalence of software supply chain attacks: Empirical study and investigative framework:
Software Supply Chain Attacks (SSCAs) typically compromise hosts through trusted but infected software. The intent of this paper is twofold: First, we present an empirical study of the most prominent software supply chain attacks and their characteristics. Second, we propose an investigative framework for identifying, expressing, and evaluating characteristic behaviours of newfound attacks for mitigation and future defense purposes. We hypothesize that these behaviours are statistically malicious, existed in the past, and thus could have been thwarted in modernity through their cementation x-years ago. [ ]

On our mailing list this month:
  • Mattia Rizzolo is asking everyone in the community to save the date for the 2023 Reproducible Builds summit, which will take place between October 31st and November 2nd at Dock Europe in Hamburg, Germany. Separate announcement(s) to follow. [ ]
  • ahojlm posted a message announcing a new project which is the first project offering bootstrappable and verifiable builds without any binary seeds. That is to say, a way of providing a verifiable path towards a trusted software development platform without relying on pre-provided binary code, in order to protect against various forms of compiler backdoors. The project's homepage is hosted on Tor (mirror).

The minutes and logs from our March 2023 IRC meeting have been published. In case you missed this one, our next IRC meeting will take place on Tuesday 25th April at 15:00 UTC on #reproducible-builds on the OFTC network.
As a Valentine's Day present, Holger Levsen wrote on his blog on 14th February to express his thanks to OSUOSL for their continuous support of reproducible-builds.org. [ ]

Debian Vagrant Cascadian developed an easier setup for testing Debian packages, which uses sbuild's unshare mode along with reprotest, our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. [ ]
Over 30 reviews of Debian packages were added, 14 were updated and 7 were removed this month, all adding to our knowledge about identified issues. A number of issues were updated as well, including Holger Levsen updating build_path_captured_in_assembly_objects to note that it has been fixed for GCC 13 [ ], and Vagrant Cascadian adding new issues to mark packages where the build path is being captured via the Rust toolchain [ ], as well as a new categorisation for virtual packages with nondeterministic versioned dependencies [ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches. In addition, Vagrant Cascadian filed a bug with a patch to ensure GNU Modula-2 supports the SOURCE_DATE_EPOCH environment variable.
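For context, SOURCE_DATE_EPOCH is the convention such patches implement: a build tool should embed this externally supplied timestamp instead of the current time, so that two builds of the same source produce identical artifacts. A minimal sketch in Python; the variable name comes from the specification, while the fallback to the current time is a common choice rather than anything mandated:

import os
import time

# Prefer the externally supplied, deterministic timestamp;
# fall back to the current time if it is not set.
build_time = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(build_time)))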

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In March, the following changes were made by Holger Levsen:
  • Arch Linux-related changes:
    • Build Arch packages in /tmp/archlinux-ci/$SRCPACKAGE instead of /tmp/$SRCPACKAGE. [ ]
    • Start 2/3 of the builds on the o1 node, the rest on o2. [ ]
    • Add graphs for Arch Linux (and OpenWrt) builds. [ ]
    • Toggle Arch-related builders to debug why a specific node overloaded. [ ][ ][ ][ ]
  • Node health checks:
    • Detect SetuptoolsDeprecationWarning tracebacks in Python builds. [ ]
    • Detect failures to perform chdist calls. [ ][ ]
  • OSUOSL node migration:
    • Install megacli packages that are needed for hardware RAID. [ ][ ]
    • Add health check and maintenance jobs for new nodes. [ ]
    • Add mail config for new nodes. [ ][ ]
    • Handle a node running in the future correctly. [ ][ ]
    • Migrate some nodes to Debian bookworm. [ ]
    • Fix nodes health overview for osuosl3. [ ]
    • Make sure the /srv/workspace directory is owned by the jenkins user. [ ]
    • Use .debian.net names everywhere, except when communicating with the outside world. [ ]
    • Grant fpierret access to a new node. [ ]
    • Update documentation. [ ]
    • Misc migration changes. [ ][ ][ ][ ][ ][ ][ ][ ]
  • Misc changes:
    • Enable fail2ban everywhere and monitor it with munin [ ].
    • Gracefully deal with non-existing Alpine schroots. [ ]
In addition, Roland Clobus is continuing his work on reproducible Debian ISO images:
  • Add/update openQA configuration [ ], and use the actual timestamp for openQA builds [ ].
  • Moved adding the user to the docker group from the janitor_setup_worker script to the (more general) update_jdn.sh script. [ ]
  • Use the (short-term) reproducible source when generating live-build images. [ ]

diffoscope development diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats as well. This month, Mattia Rizzolo released version 238, and Chris Lamb released versions 239 and 240. Chris Lamb also made the following changes:
  • Fix compatibility with PyPDF 3.x, and correctly restore test data. [ ]
  • Rework PDF annotation handling into a separate method. [ ]
In addition, Holger Levsen performed a long-overdue overhaul of the Lintian overrides in the Debian packaging [ ][ ][ ][ ], and Mattia Rizzolo updated the packaging to silence a warning related to include_package_data=True [ ], fixed the build under Debian bullseye [ ], fixed a tool name in the list of tools permitted to be absent during package build tests [ ], as well as documented sending out an email upon [ ]. In addition, Vagrant Cascadian updated the version of diffoscope in GNU Guix to 238 [ ] and 239 [ ]. Vagrant also updated reprotest to version 0.7.23. [ ]

Other development work Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.


If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via our mailing list or the #reproducible-builds IRC channel on OFTC.

Dirk Eddelbuettel: RcppArmadillo 0.12.2.0.0 on CRAN: New Upstream Minor

Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 1052 other packages on CRAN, downloaded 28.6 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 522 times according to Google Scholar. This release brings a new upstream release 12.2.0 made by Conrad a day or so ago. We prepared the usual release candidate, tested it on the over 1000 reverse depends, found no issues, and sent it to CRAN. There it got tested again and was auto-processed smoothly. The release actually has a relatively small set of changes as a first follow-up release in the 12.2.* series.

Changes in RcppArmadillo version 0.12.2.0.0 (2023-04-04)
  • Upgraded to Armadillo release 12.2.0 (Cortisol Profusion Deluxe)
    • more efficient use of FFTW3 by fft() and ifft()
    • faster in-place element-wise multiplication of sparse matrices by dense matrices
    • added spsolve_factoriser class to allow reuse of sparse matrix factorisation for solving systems of linear equations

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

23 March 2023

Dirk Eddelbuettel: RcppSMC 0.2.7 on CRAN: Extensions and Update

A new release 0.2.7 of our RcppSMC package arrived at CRAN earlier today. It contains several extensions added by team member (and former GSoC student) Ilya Zarubin since the last release. We were a little slow to release those, but one of those CRAN emails forced our hand for a release now. The updated uninitialized-variable messages in clang++-16 have found a fan in Brian Ripley, and so he sent us a note. As the issue was trivially reproducible with clang++-15 here too, I had it fixed in no time. Both changes taken together form the incremental 0.2.7 release. RcppSMC provides Rcpp-based bindings to R for the Sequential Monte Carlo Template Classes (SMCTC) by Adam Johansen described in his JSS article. Sequential Monte Carlo is also referred to as a Particle Filter in some contexts. The package now also features the Google Summer of Code work by Leah South in 2017, and by Ilya Zarubin in 2021. The release is summarized below.

Changes in RcppSMC version 0.2.7 (2023-03-22)
  • Extensive extensions for conditional SMC and resample, updated hello_world example, added skeleton function for easier package creation (Ilya in #67,#72)
  • Small package updates (Dirk in #75 fixing #74)

Courtesy of my CRANberries, there is a diffstat report for this release. More information is on the RcppSMC page. Issues and bugreports should go to the GitHub issue tracker. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

9 March 2023

Dirk Eddelbuettel: RcppRedis 0.2.3 on CRAN: Maintenance

A new minor release 0.2.3 of our RcppRedis package arrived on CRAN today. RcppRedis is one of several packages connecting R to the fabulous Redis in-memory datastructure store (and much more). RcppRedis does not pretend to be feature complete, but it may do some things faster than the other interfaces, and also offers an optional coupling with MessagePack binary (de)serialization via RcppMsgPack. The package has carried production loads on a trading floor for several years. This update is fairly mechanical. CRAN wants everybody off the C++11 train, which is fair game given that it is 2023 and most sane and lucky people are facing sane and modern compilers, so this makes sense. (And I raise a toast to all those poor souls facing RHEL 7 / CentOS 7 with a compiler from many moons ago: I hear it is a vibrant job market out there, so maybe it is time to make a switch.) As with a few of my other packages, this release simply does away with the imposition of C++11, as the package will compile just fine under C++14 or C++17 (as governed by your version of R). The detailed changes list follows.

Changes in version 0.2.3 (2023-03-08)
  • No longer set a C++ compilation standard as the default choices by R are sufficient for the package
  • Switch include to Rcpp/Rcpp which signals use of all Rcpp features including Modules

Courtesy of my CRANberries, there is also a diffstat report for this release. More information is on the RcppRedis page. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

7 March 2023

Dirk Eddelbuettel: RcppFastAD 0.0.1 and 0.0.2: New Package on CRAN!

James Yang and I are thrilled to announce the new CRAN package RcppFastAD, which arrived at CRAN last Monday as version 0.0.1 and is as of today at version 0.0.2 with a first set of small updates. It is based on the FastAD header-only C++ library by James, which provides a C++ implementation of both forward and reverse mode automatic differentiation in an easy-to-use header library (which we wrapped here) that is both lightweight and performant. With a little bit of Rcpp glue, it is also easy to use from R in simple C++ applications. Included in the package are three examples: a simple quadratic expression evaluating x' S x for given x and S, returning the expression value along with the gradient; a linear regression example generalising this and using the gradient to arrive at the least-squares minimizing solution; as well as the well-known Black-Scholes options pricer and its important partial derivatives delta, rho, theta and vega derived via automatic differentiation. The NEWS file for these two initial releases follows.

Changes in version 0.0.2 (2023-03-05)
  • One C++ operation is protected from operating on a nullptr
  • Additional tests have been added, tests now cover all three demo / example functions
  • Return values and code for the examples linear_regression and quadratic_expression have been adjusted

Changes in version 0.0.1 (2023-02-24)
  • Initial release version and CRAN upload

Courtesy of my CRANberries, there is also a diffstat report for the most recent release. More information is available at the repository or the package page. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

6 March 2023

Vincent Bernat: DDoS detection and remediation with Akvorado and Flowspec

Akvorado collects sFlow and IPFIX flows, stores them in a ClickHouse database, and presents them in a web console. Although it lacks built-in DDoS detection, it's possible to create one by crafting custom ClickHouse queries.

DDoS detection Let's assume we want to detect DDoS attacks targeting our customers. As an example, we consider a DDoS attack as a collection of flows over one minute targeting a single customer IP address, from a single source port, and matching one of these conditions:
  • an average bandwidth of 1 Gbps,
  • an average bandwidth of 200 Mbps when the protocol is UDP,
  • more than 20 source IP addresses and an average bandwidth of 100 Mbps, or
  • more than 10 source countries and an average bandwidth of 100 Mbps.
Here is the SQL query to detect such attacks over the last 5 minutes:
SELECT *
FROM (
  SELECT
    toStartOfMinute(TimeReceived) AS TimeReceived,
    DstAddr,
    SrcPort,
    dictGetOrDefault('protocols', 'name', Proto, '???') AS Proto,
    SUM(((((Bytes * SamplingRate) * 8) / 1000) / 1000) / 1000) / 60 AS Gbps,
    uniq(SrcAddr) AS sources,
    uniq(SrcCountry) AS countries
  FROM flows
  WHERE TimeReceived > now() - INTERVAL 5 MINUTE
    AND DstNetRole = 'customers'
  GROUP BY
    TimeReceived,
    DstAddr,
    SrcPort,
    Proto
)
WHERE (Gbps > 1)
   OR ((Proto = 'UDP') AND (Gbps > 0.2)) 
   OR ((sources > 20) AND (Gbps > 0.1)) 
   OR ((countries > 10) AND (Gbps > 0.1))
ORDER BY
  TimeReceived DESC,
  Gbps DESC
Here is an example output1 where two of our users are under attack. One from what looks like an NTP amplification attack, the other from a DNS amplification attack:
TimeReceived DstAddr SrcPort Proto Gbps sources countries
2023-02-26 17:44:00 ::ffff:203.0.113.206 123 UDP 0.102 109 13
2023-02-26 17:43:00 ::ffff:203.0.113.206 123 UDP 0.130 133 17
2023-02-26 17:43:00 ::ffff:203.0.113.68 53 UDP 0.129 364 63
2023-02-26 17:43:00 ::ffff:203.0.113.206 123 UDP 0.113 129 21
2023-02-26 17:42:00 ::ffff:203.0.113.206 123 UDP 0.139 50 14
2023-02-26 17:42:00 ::ffff:203.0.113.206 123 UDP 0.105 42 14
2023-02-26 17:40:00 ::ffff:203.0.113.68 53 UDP 0.121 340 65

DDoS remediation Once detected, there are at least two ways to stop the attack at the network level:
  • blackhole the traffic to the targeted user (RTBH), or
  • selectively drop packets matching the attack patterns (Flowspec).

Traffic blackhole The easiest method is to sacrifice the attacked user. While this helps the attacker, this protects your network. It is a method supported by all routers. You can also offload this protection to many transit providers. This is useful if the attack volume exceeds your internet capacity. This works by advertising with BGP a route to the attacked user with a specific community. The border router modifies the next hop address of these routes to a specific IP address configured to forward the traffic to a null interface. RFC 7999 defines 65535:666 for this purpose. This is known as a remote-triggered blackhole (RTBH) and is explained in more detail in RFC 3882. It is also possible to blackhole the source of the attacks by leveraging unicast Reverse Path Forwarding (uRPF) from RFC 3704, as explained in RFC 5635. However, uRPF can be a serious tax on your router resources. See NCS5500 uRPF: Configuration and Impact on Scale for an example of the kind of restrictions you have to expect when enabling uRPF. On the advertising side, we can use BIRD. Here is a complete configuration file to allow any router to collect them:
log stderr all;
router id 192.0.2.1;
protocol device {
  scan time 10;
}
protocol bgp exporter {
  ipv4 {
    import none;
    export where proto = "blackhole4";
  };
  ipv6 {
    import none;
    export where proto = "blackhole6";
  };
  local as 64666;
  neighbor range 192.0.2.0/24 external;
  multihop;
  dynamic name "exporter";
  dynamic name digits 2;
  graceful restart yes;
  graceful restart time 0;
  long lived graceful restart yes;
  long lived stale time 3600;  # keep routes for 1 hour!
}
protocol static blackhole4 {
  ipv4;
  route 203.0.113.206/32 blackhole {
    bgp_community.add((65535, 666));
  };
  route 203.0.113.68/32 blackhole {
    bgp_community.add((65535, 666));
  };
}
protocol static blackhole6 {
  ipv6;
}
We use BGP long-lived graceful restart to ensure routes are kept for one hour, even if the BGP connection goes down, notably during maintenance. On the receiver side, if you have a Cisco router running IOS XR, you can use the following configuration to blackhole traffic received on the BGP session. As the BGP session is dedicated to this usage, the community is not used, but you can also forward these routes to your transit providers.
router static
 vrf public
  address-family ipv4 unicast
   192.0.2.1/32 Null0 description "BGP blackhole"
  !
  address-family ipv6 unicast
   2001:db8::1/128 Null0 description "BGP blackhole"
  !
 !
!
route-policy blackhole_ipv4_in_public
  if destination in (0.0.0.0/0 le 31) then
    drop
  endif
  set next-hop 192.0.2.1
  done
end-policy
!
route-policy blackhole_ipv6_in_public
  if destination in (::/0 le 127) then
    drop
  endif
  set next-hop 2001:db8::1
  done
end-policy
!
router bgp 12322
 neighbor-group BLACKHOLE_IPV4_PUBLIC
  remote-as 64666
  ebgp-multihop 255
  update-source Loopback10
  address-family ipv4 unicast
   maximum-prefix 100 90
   route-policy blackhole_ipv4_in_public in
   route-policy drop out
   long-lived-graceful-restart stale-time send 86400 accept 86400
  !
  address-family ipv6 unicast
   maximum-prefix 100 90
   route-policy blackhole_ipv6_in_public in
   route-policy drop out
   long-lived-graceful-restart stale-time send 86400 accept 86400
  !
 !
 vrf public
  neighbor 192.0.2.1
   use neighbor-group BLACKHOLE_IPV4_PUBLIC
   description akvorado-1
When the traffic is blackholed, it is still reported by IPFIX and sFlow. In Akvorado, use ForwardingStatus >= 128 as a filter. While this method is compatible with all routers, it makes the attack successful as the target is completely unreachable. If your router supports it, Flowspec can selectively filter flows to stop the attack without impacting the customer.

Flowspec Flowspec is defined in RFC 8955 and enables the transmission of flow specifications in BGP sessions. A flow specification is a set of matching criteria to apply to IP traffic. These criteria include the source and destination prefix, the IP protocol, the source and destination port, and the packet length. Each flow specification is associated with an action, encoded as an extended community: traffic shaping, traffic marking, or redirection. To announce flow specifications with BIRD, we extend our configuration. The extended community used shapes the matching traffic to 0 bytes per second.
flow4 table flowtab4;
flow6 table flowtab6;
protocol bgp exporter {
  flow4 {
    import none;
    export where proto = "flowspec4";
  };
  flow6 {
    import none;
    export where proto = "flowspec6";
  };
  # [ ]
}
protocol static flowspec4 {
  flow4;
  route flow4 {
    dst 203.0.113.68/32;
    sport = 53;
    length >= 1476 && <= 1500;
    proto = 17;
  }{
    bgp_ext_community.add((generic, 0x80060000, 0x00000000));
  };
  route flow4 {
    dst 203.0.113.206/32;
    sport = 123;
    length = 468;
    proto = 17;
  }{
    bgp_ext_community.add((generic, 0x80060000, 0x00000000));
  };
}
protocol static flowspec6 {
  flow6;
}
If you have a Cisco router running IOS XR, the configuration may look like this:
vrf public
 address-family ipv4 flowspec
 address-family ipv6 flowspec
!
router bgp 12322
 address-family vpnv4 flowspec
 address-family vpnv6 flowspec
 neighbor-group FLOWSPEC_IPV4_PUBLIC
  remote-as 64666
  ebgp-multihop 255
  update-source Loopback10
  address-family ipv4 flowspec
   long-lived-graceful-restart stale-time send 86400 accept 86400
   route-policy accept in
   route-policy drop out
   maximum-prefix 100 90
   validation disable
  !
  address-family ipv6 flowspec
   long-lived-graceful-restart stale-time send 86400 accept 86400
   route-policy accept in
   route-policy drop out
   maximum-prefix 100 90
   validation disable
  !
 !
 vrf public
  address-family ipv4 flowspec
  address-family ipv6 flowspec
  neighbor 192.0.2.1
   use neighbor-group FLOWSPEC_IPV4_PUBLIC
   description akvorado-1
Then, you need to enable Flowspec on all interfaces with:
flowspec
 vrf public
  address-family ipv4
   local-install interface-all
  !
  address-family ipv6
   local-install interface-all
  !
 !
!
As with the RTBH setup, you can filter dropped flows with ForwardingStatus >= 128.
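As an illustration, here is a small sketch using the same clickhouse_driver client as the Python script later in this post to report how much traffic is currently being dropped. The host name is a placeholder, and the column names follow the Akvorado schema used throughout this post:

from clickhouse_driver import Client as CHClient

client = CHClient(host="clickhouse.akvorado.net")  # placeholder host
rows = client.execute(
    """
    SELECT DstAddr,
           sum(Bytes * SamplingRate) * 8 / 3600 AS bps  -- average over one hour
    FROM flows
    WHERE TimeReceived > now() - INTERVAL 60 MINUTE
      AND ForwardingStatus >= 128  -- dropped traffic (RTBH or Flowspec)
    GROUP BY DstAddr
    ORDER BY bps DESC
    LIMIT 10
    """
)
for addr, bps in rows:
    print(addr, f"{bps / 1e6:.1f} Mbps")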

DDoS detection (continued) In the example using Flowspec, the flows were also filtered on the length of the packet:
route flow4 {
  dst 203.0.113.68/32;
  sport = 53;
  length >= 1476 && <= 1500;
  proto = 17;
}{
  bgp_ext_community.add((generic, 0x80060000, 0x00000000));
};
This is an important addition: legitimate DNS requests are smaller than this and are therefore not filtered.2 With ClickHouse, you can get the 10th and 90th percentiles of the packet sizes with quantiles(0.1, 0.9)(Bytes/Packets). The last issue we need to tackle is how to optimize the query: it may need several seconds to collect the data and is likely to consume substantial resources from your ClickHouse database. One solution is to create a materialized view to pre-aggregate results:
CREATE TABLE ddos_logs (
  TimeReceived DateTime,
  DstAddr IPv6,
  Proto UInt32,
  SrcPort UInt16,
  Gbps SimpleAggregateFunction(sum, Float64),
  Mpps SimpleAggregateFunction(sum, Float64),
  sources AggregateFunction(uniqCombined(12), IPv6),
  countries AggregateFunction(uniqCombined(12), FixedString(2)),
  size AggregateFunction(quantiles(0.1, 0.9), UInt64)
) ENGINE = SummingMergeTree
PARTITION BY toStartOfHour(TimeReceived)
ORDER BY (TimeReceived, DstAddr, Proto, SrcPort)
TTL toStartOfHour(TimeReceived) + INTERVAL 6 HOUR DELETE;
CREATE MATERIALIZED VIEW ddos_logs_view TO ddos_logs AS
  SELECT
    toStartOfMinute(TimeReceived) AS TimeReceived,
    DstAddr,
    Proto,
    SrcPort,
    sum(((((Bytes * SamplingRate) * 8) / 1000) / 1000) / 1000) / 60 AS Gbps,
    sum(((Packets * SamplingRate) / 1000) / 1000) / 60 AS Mpps,
    uniqCombinedState(12)(SrcAddr) AS sources,
    uniqCombinedState(12)(SrcCountry) AS countries,
    quantilesState(0.1, 0.9)(toUInt64(Bytes/Packets)) AS size
  FROM flows
  WHERE DstNetRole = 'customers'
  GROUP BY
    TimeReceived,
    DstAddr,
    Proto,
    SrcPort
The ddos_logs table is using the SummingMergeTree engine. When the table receives new data, ClickHouse replaces all the rows with the same sorting key, as defined by the ORDER BY directive, with one row which contains summarized values using either the sum() function or the explicitly specified aggregate function (uniqCombined and quantiles in our example).3 Finally, we can modify our initial query with the following one:
SELECT *
FROM (
  SELECT
    TimeReceived,
    DstAddr,
    dictGetOrDefault('protocols', 'name', Proto, '???') AS Proto,
    SrcPort,
    sum(Gbps) AS Gbps,
    sum(Mpps) AS Mpps,
    uniqCombinedMerge(12)(sources) AS sources,
    uniqCombinedMerge(12)(countries) AS countries,
    quantilesMerge(0.1, 0.9)(size) AS size
  FROM ddos_logs
  WHERE TimeReceived > now() - INTERVAL 60 MINUTE
  GROUP BY
    TimeReceived,
    DstAddr,
    Proto,
    SrcPort
)
WHERE (Gbps > 1)
   OR ((Proto = 'UDP') AND (Gbps > 0.2)) 
   OR ((sources > 20) AND (Gbps > 0.1)) 
   OR ((countries > 10) AND (Gbps > 0.1))
ORDER BY
  TimeReceived DESC,
  Gbps DESC

Gluing everything together To sum up, building an anti-DDoS system requires following these steps:
  1. define a set of criteria to detect a DDoS attack,
  2. translate these criteria into SQL requests,
  3. pre-aggregate flows into SummingMergeTree tables,
  4. query and transform the results to a BIRD configuration file, and
  5. configure your routers to pull the routes from BIRD.
A Python script like the following one can handle the fourth step. For each attacked target, it generates both a Flowspec rule and a blackhole route.
import filecmp
import logging
import os
import socket
import subprocess
import types
from clickhouse_driver import Client as CHClient

logger = logging.getLogger("anti-ddos")

# Put your SQL query here!
SQL_QUERY = "…"

# How many anti-DDoS rules we want at the same time?
MAX_DDOS_RULES = 20

def samefile(a, b):
    # Compare file contents to detect whether the generated ruleset changed.
    return filecmp.cmp(a, b, shallow=False)

def empty_ruleset():
    ruleset = types.SimpleNamespace()
    ruleset.flowspec = types.SimpleNamespace()
    ruleset.blackhole = types.SimpleNamespace()
    ruleset.flowspec.v4 = []
    ruleset.flowspec.v6 = []
    ruleset.blackhole.v4 = []
    ruleset.blackhole.v6 = []
    return ruleset

current_ruleset = empty_ruleset()
client = CHClient(host="clickhouse.akvorado.net")
while True:
    results = client.execute(SQL_QUERY)
    seen = {}
    new_ruleset = empty_ruleset()
    for (t, addr, proto, port, gbps, mpps, sources, countries, size) in results:
        if (addr, proto, port) in seen:
            continue
        seen[(addr, proto, port)] = True
        # Flowspec
        if addr.ipv4_mapped:
            address = addr.ipv4_mapped
            rules = new_ruleset.flowspec.v4
            table = "flow4"
            mask = 32
            nh = "proto"
        else:
            address = addr
            rules = new_ruleset.flowspec.v6
            table = "flow6"
            mask = 128
            nh = "next header"
        if size[0] == size[1]:
            length = f"length = {int(size[0])}"
        else:
            length = f"length >= {int(size[0])} && <= {int(size[1])}"
        header = f"""
# Time: {t}
# Source: {address}, protocol: {proto}, port: {port}
# Gbps/Mpps: {gbps:.3}/{mpps:.3}, packet size: {int(size[0])}<=X<={int(size[1])}
# Sources: {sources}, countries: {countries}
"""
        rules.append(
            f"""{header}
route {table} {{
  dst {address}/{mask};
  sport = {port};
  {length};
  {nh} = {socket.getprotobyname(proto)};
}}{{
  bgp_ext_community.add((generic, 0x80060000, 0x00000000));
}};
"""
        )
        # Blackhole
        if addr.ipv4_mapped:
            rules = new_ruleset.blackhole.v4
        else:
            rules = new_ruleset.blackhole.v6
        rules.append(
            f"""{header}
route {address}/{mask} blackhole {{
  bgp_community.add((65535, 666));
}};
"""
        )
    new_ruleset.flowspec.v4 = list(
        set(new_ruleset.flowspec.v4[:MAX_DDOS_RULES])
    )
    new_ruleset.flowspec.v6 = list(
        set(new_ruleset.flowspec.v6[:MAX_DDOS_RULES])
    )
    # TODO: advertise changes by mail, chat, ...
    current_ruleset = new_ruleset
    changes = False
    for rules, path in (
        (current_ruleset.flowspec.v4, "v4-flowspec"),
        (current_ruleset.flowspec.v6, "v6-flowspec"),
        (current_ruleset.blackhole.v4, "v4-blackhole"),
        (current_ruleset.blackhole.v6, "v6-blackhole"),
    ):
        path = os.path.join("/etc/bird/", f"{path}.conf")
        with open(f"{path}.tmp", "w") as f:
            for r in rules:
                f.write(r)
        changes = (
            changes or not os.path.exists(path) or not samefile(path, f"{path}.tmp")
        )
        os.rename(f"{path}.tmp", path)
    if not changes:
        continue
    proc = subprocess.Popen(
        ["birdc", "configure"],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdout, stderr = proc.communicate(None)
    stdout = stdout.decode("utf-8", "replace")
    stderr = stderr.decode("utf-8", "replace")
    if proc.returncode != 0:
        logger.error(
            "{} error:\n{}\n{}".format(
                "birdc reconfigure",
                "\n".join(
                    [" O: {}".format(line) for line in stdout.rstrip().split("\n")]
                ),
                "\n".join(
                    [" E: {}".format(line) for line in stderr.rstrip().split("\n")]
                ),
            )
        )

Until Akvorado integrates DDoS detection and mitigation, the ideas presented in this blog post provide a solid foundation to get started with your own anti-DDoS system.

  1. ClickHouse can export results using Markdown format when appending FORMAT Markdown to the query.
  2. While most DNS clients should retry with TCP on failures, this is not always the case: until recently, musl libc did not implement this.
  3. The materialized view also aggregates the data at hand, both for efficiency and to ensure we work with the right data types.

22 February 2023

Dirk Eddelbuettel: RcppArmadillo 0.12.0.1.0 on CRAN: New Upstream, New Features

Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 1042 other packages on CRAN, downloaded 28.1 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 513 times according to Google Scholar. This release brings a new upstream release 12.0.1. We found a small regression with the 12.0.0 release when we tested prior to a CRAN upload. Conrad very promptly fixed this with a literal one-liner and made it 12.0.1, which we wrapped up as 0.12.0.1.0. Subsequent testing revealed no issues for us, and CRAN auto-processed it as I tweeted earlier. This is actually quite impressive given the over 1000 CRAN packages using it, all of which got tested again by CRAN. All this is testament to the rigour, as well as the well-oiled process, at the repository. Our thanks go to the tireless maintainers! The release actually has a rather nice set of changes (detailed below) to which we added one robustification thanks to Kevin. The full set of changes follows. We include the previous changeset as we may have skipped the usual blog post here.

Changes in RcppArmadillo version 0.12.0.1.0 (2023-02-20)
  • Upgraded to Armadillo release 12.0.1 (Cortisol Profusion)
    • faster fft() and ifft() via optional use of FFTW3
    • faster min() and max()
    • faster index_min() and index_max()
    • added .col_as_mat() and .row_as_mat() which return matrix representation of cube column and cube row
    • added csv_opts::strict option to loading CSV files to interpret missing values as NaN
    • added check_for_zeros option to form 4 of sparse matrix batch constructors
    • inv() and inv_sympd() with options inv_opts::no_ugly or inv_opts::allow_approx now use a scaled threshold similar to pinv()
    • set_cout_stream() and set_cerr_stream() are now no-ops; instead use the options ARMA_WARN_LEVEL, or ARMA_COUT_STREAM, or ARMA_CERR_STREAM
    • fix regression (mis-compilation) in shift() function (reported by us in #409)
  • The include directory order is now more robust (Kevin Ushey in #407 addressing #406)

Changes in RcppArmadillo version 0.11.4.4.0 (2023-02-09)
  • Upgraded to Armadillo release 11.4.4 (Ship of Theseus)
    • extended pow() with various forms of element-wise power operations
    • added find_nan() to find indices of NaN elements
    • faster handling of compound expressions by sum()
  • The package no longer sets a compilation standard, or propagates one in the generated packages, as R ensures C++11 on all non-ancient versions
  • The CITATION file was updated to the current format

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

6 February 2023

Vincent Bernat: Fast and dynamic encoding of Protocol Buffers in Go

Protocol Buffers are a popular choice for serializing structured data due to their compact size, fast processing speed, language independence, and compatibility. There exist other alternatives, including Cap'n Proto, CBOR, and Avro. Usually, data structures are described in a proto definition file (.proto). The protoc compiler and a language-specific plugin convert it into code:
$ head flow-4.proto
syntax = "proto3";
package decoder;
option go_package = "akvorado/inlet/flow/decoder";
message FlowMessagev4 {
  uint64 TimeReceived = 2;
  uint32 SequenceNum = 3;
  uint64 SamplingRate = 4;
  uint32 FlowDirection = 5;
$ protoc -I=. --plugin=protoc-gen-go --go_out=module=akvorado:. flow-4.proto
$ head inlet/flow/decoder/flow-4.pb.go
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
//      protoc-gen-go v1.28.0
//      protoc        v3.21.12
// source: inlet/flow/data/schemas/flow-4.proto
package decoder
import (
        protoreflect "google.golang.org/protobuf/reflect/protoreflect"
Akvorado collects network flows using IPFIX or sFlow, decodes them with GoFlow2, encodes them to Protocol Buffers, and sends them to Kafka to be stored in a ClickHouse database. Collecting a new field, such as source and destination MAC addresses, requires modifications in multiple places, including the proto definition file and the ClickHouse migration code. Moreover, the cost is paid by all users.1 It would be nice to have an application-wide schema and let users enable or disable the fields they need. While the main goal is flexibility, we do not want to sacrifice performance. On this front, this is quite a success: when upgrading from 1.6.4 to 1.7.1, the decoding and encoding performance almost doubled!
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                               initial.txt     final.txt
                                  sec/op         sec/op       vs base
Netflow/with_encoding-12      12.963µ ± 2%   7.836µ ± 1%   -39.55% (p=0.000 n=10)
Sflow/with_encoding-12         19.37µ ± 1%   10.15µ ± 2%   -47.63% (p=0.000 n=10)

Faster Protocol Buffers encoding I use the following code to benchmark both the decoding and encoding process. Initially, the Decode() method is a thin layer above the GoFlow2 producer and stores the decoded data into the in-memory structure generated by protoc. Later, some of the data will be encoded directly during flow decoding. This is why we measure both the decoding and the encoding.2
func BenchmarkDecodeEncodeSflow(b *testing.B) {
    r := reporter.NewMock(b)
    sdecoder := sflow.New(r)
    data := helpers.ReadPcapPayload(b,
        filepath.Join("decoder", "sflow", "testdata", "data-1140.pcap"))
    for _, withEncoding := range []bool{true, false} {
        title := map[bool]string{
            true:  "with encoding",
            false: "without encoding",
        }[withEncoding]
        var got []*decoder.FlowMessage
        b.Run(title, func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                got = sdecoder.Decode(decoder.RawFlow{
                    Payload: data,
                    Source:  net.ParseIP("127.0.0.1"),
                })
                if withEncoding {
                    for _, flow := range got {
                        buf := []byte{}
                        buf = protowire.AppendVarint(buf, uint64(proto.Size(flow)))
                        proto.MarshalOptions{}.MarshalAppend(buf, flow)
                    }
                }
            }
        })
    }
}
The canonical Go implementation for Protocol Buffers, google.golang.org/protobuf, is not the most efficient one. For a long time, people were relying on gogoprotobuf. However, the project is now deprecated. A good replacement is vtprotobuf.3
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                               initial.txt    bench-2.txt
                                 sec/op         sec/op       vs base
Netflow/with_encoding-12      12.96µ ± 2%   10.28µ ± 2%   -20.67% (p=0.000 n=10)
Netflow/without_encoding-12   8.935µ ± 2%   8.975µ ± 2%         ~ (p=0.143 n=10)
Sflow/with_encoding-12        19.37µ ± 1%   16.67µ ± 2%   -13.93% (p=0.000 n=10)
Sflow/without_encoding-12     14.62µ ± 3%   14.87µ ± 1%    +1.66% (p=0.007 n=10)

Dynamic Protocol Buffers encoding We have our baseline. Let's see how to encode our Protocol Buffers without a .proto file. The wire format is simple and relies a lot on variable-width integers. Variable-width integers, or varints, are an efficient way of encoding unsigned integers using a variable number of bytes, from one to ten, with small values using fewer bytes. They work by splitting integers into 7-bit payloads and using the 8th bit as a continuation indicator, set to 1 for all payloads except the last.
Figure: variable-width integer encoding in Protocol Buffers, showing the conversion of 150 to a varint.
For our usage, we only need two types: variable-width integers and byte sequences. A byte sequence is encoded by prefixing it with its length as a varint. When a message is encoded, each key-value pair is turned into a record consisting of a field number, a wire type, and a payload. The field number and the wire type are encoded as a single variable-width integer called a tag.
Figure: a message encoded with Protocol Buffers, consisting of three varints and two byte sequences.
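To make the wire format concrete, here is a minimal Python sketch of the varint and tag encoding described above; the function names are illustrative and not taken from any library:

def encode_varint(value: int) -> bytes:
    # Split into 7-bit groups, least-significant first; set the high bit
    # on every byte except the last as a continuation indicator.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

assert encode_varint(150) == b"\x96\x01"  # the conversion shown in the figure

def encode_tag(field_number: int, wire_type: int) -> bytes:
    # A tag packs the field number and the wire type (3 low bits) into one varint.
    return encode_varint((field_number << 3) | wire_type)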
We build the output buffer with low-level functions from the protowire package. Our schema abstraction contains the appropriate information to encode a message (ProtobufIndex) and to generate a proto definition file (fields starting with Protobuf):
type Column struct {
    Key      ColumnKey
    Name     string
    Disabled bool
    // [ ]
    // For protobuf.
    ProtobufIndex    protowire.Number
    ProtobufType     protoreflect.Kind // Uint64Kind, Uint32Kind, ...
    ProtobufEnum     map[int]string
    ProtobufEnumName string
    ProtobufRepeated bool
}
We have a few helper methods around the protowire functions to directly encode the fields while decoding the flows. They skip disabled fields or non-repeated fields already encoded. Here is an excerpt of the sFlow decoder:
sch.ProtobufAppendVarint(bf, schema.ColumnBytes, uint64(recordData.Base.Length))
sch.ProtobufAppendVarint(bf, schema.ColumnProto, uint64(recordData.Base.Protocol))
sch.ProtobufAppendVarint(bf, schema.ColumnSrcPort, uint64(recordData.Base.SrcPort))
sch.ProtobufAppendVarint(bf, schema.ColumnDstPort, uint64(recordData.Base.DstPort))
sch.ProtobufAppendVarint(bf, schema.ColumnEType, helpers.ETypeIPv4)
For fields that are required later in the pipeline, like source and destination addresses, they are stored unencoded in a separate structure:
type FlowMessage struct {
    TimeReceived uint64
    SamplingRate uint32
    // For exporter classifier
    ExporterAddress netip.Addr
    // For interface classifier
    InIf  uint32
    OutIf uint32
    // For geolocation or BMP
    SrcAddr netip.Addr
    DstAddr netip.Addr
    NextHop netip.Addr
    // Core component may override them
    SrcAS     uint32
    DstAS     uint32
    GotASPath bool
    // protobuf is the protobuf representation for the information not contained above.
    protobuf    []byte
    protobufSet bitset.BitSet
}
The protobuf slice holds encoded data. It is initialized with a capacity of 500 bytes to avoid resizing during encoding. There is also some reserved room at the beginning to be able to encode the total size as a variable-width integer. Upon finalizing encoding, the remaining fields are added and the message length is prefixed:
func (schema *Schema) ProtobufMarshal(bf *FlowMessage) []byte {
    schema.ProtobufAppendVarint(bf, ColumnTimeReceived, bf.TimeReceived)
    schema.ProtobufAppendVarint(bf, ColumnSamplingRate, uint64(bf.SamplingRate))
    schema.ProtobufAppendIP(bf, ColumnExporterAddress, bf.ExporterAddress)
    schema.ProtobufAppendVarint(bf, ColumnSrcAS, uint64(bf.SrcAS))
    schema.ProtobufAppendVarint(bf, ColumnDstAS, uint64(bf.DstAS))
    schema.ProtobufAppendIP(bf, ColumnSrcAddr, bf.SrcAddr)
    schema.ProtobufAppendIP(bf, ColumnDstAddr, bf.DstAddr)
    // Add length and move it as a prefix
    end := len(bf.protobuf)
    payloadLen := end - maxSizeVarint
    bf.protobuf = protowire.AppendVarint(bf.protobuf, uint64(payloadLen))
    sizeLen := len(bf.protobuf) - end
    result := bf.protobuf[maxSizeVarint-sizeLen : end]
    copy(result, bf.protobuf[end:end+sizeLen])
    return result
}
Minimizing allocations is critical for maintaining encoding performance. The benchmark tests should be run with the -benchmem flag to monitor allocation numbers. Each allocation incurs an additional cost to the garbage collector. The Go profiler is a valuable tool for identifying areas of code that can be optimized:
$ go test -run=__nothing__ -bench=Netflow/with_encoding \
>         -benchmem -cpuprofile profile.out \
>         akvorado/inlet/flow
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
Netflow/with_encoding-12             143953              7955 ns/op            8256 B/op        134 allocs/op
PASS
ok      akvorado/inlet/flow     1.418s
$ go tool pprof profile.out
File: flow.test
Type: cpu
Time: Feb 4, 2023 at 8:12pm (CET)
Duration: 1.41s, Total samples = 2.08s (147.96%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) web
After using the internal schema instead of code generated from the proto definition file, the performance improved. However, this comparison is not entirely fair as less information is being decoded, and previously GoFlow2 was decoding to its own structure, which was then copied to our own version.
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                               bench-2.txt    bench-3.txt
                                  sec/op         sec/op      vs base
Netflow/with_encoding-12      10.284µ ± 2%   7.758µ ± 3%   -24.56% (p=0.000 n=10)
Netflow/without_encoding-12    8.975µ ± 2%   7.304µ ± 2%   -18.61% (p=0.000 n=10)
Sflow/with_encoding-12         16.67µ ± 2%   14.26µ ± 1%   -14.50% (p=0.000 n=10)
Sflow/without_encoding-12      14.87µ ± 1%   13.56µ ± 2%    -8.80% (p=0.000 n=10)
As for testing, we use github.com/jhump/protoreflect: the protoparse package parses the proto definition file we generate, and the dynamic package decodes the messages. Check the ProtobufDecode() method for more details.4 A decoding sketch follows the last benchmark below. To get the final figures, I also optimized the decoding in GoFlow2. It was relying heavily on binary.Read(). This function may use reflection in certain cases, and each call allocates a byte array to read data. Replacing it with a more efficient version provides the following improvement:
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                               bench-3.txt     bench-4.txt
                                  sec/op          sec/op      vs base
Netflow/with_encoding-12        7.758µ ± 3%    7.365µ ± 2%    -5.07% (p=0.000 n=10)
Netflow/without_encoding-12     7.304µ ± 2%    6.931µ ± 3%    -5.11% (p=0.000 n=10)
Sflow/with_encoding-12         14.256µ ± 1%    9.834µ ± 2%   -31.02% (p=0.000 n=10)
Sflow/without_encoding-12      13.559µ ± 2%    9.353µ ± 2%   -31.02% (p=0.000 n=10)
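To illustrate the binary.Read() replacement measured just above, fixed-width fields can be read with the byte-order helpers from encoding/binary, which neither use reflection nor allocate; the field name and offset here are invented for the example:

// Replaces: binary.Read(reader, binary.BigEndian, &samplingRate), which may
// fall back to reflection and allocates a scratch buffer on every call.
// This version only needs the encoding/binary import and does not allocate.
func readSamplingRate(payload []byte) uint32 {
    return binary.BigEndian.Uint32(payload[8:12]) // offset 8 is made up for this sketch
}

And here is the decoding sketch promised earlier, using the protoparse and dynamic packages from github.com/jhump/protoreflect; the schema.proto file name and FlowMessage message name are placeholders, and this is a simplified stand-in for the actual ProtobufDecode() test helper:

import (
    "fmt"

    "github.com/jhump/protoreflect/desc/protoparse"
    "github.com/jhump/protoreflect/dynamic"
    "google.golang.org/protobuf/encoding/protowire"
)

// decodeFlow parses the generated proto definition at runtime and uses it
// to decode one length-prefixed message produced by ProtobufMarshal().
func decodeFlow(raw []byte) (*dynamic.Message, error) {
    parser := protoparse.Parser{}
    fds, err := parser.ParseFiles("schema.proto") // placeholder file name
    if err != nil {
        return nil, err
    }
    md := fds[0].FindMessage("FlowMessage") // placeholder message name
    if md == nil {
        return nil, fmt.Errorf("message not found in schema")
    }
    // Strip the varint length prefix before decoding the payload.
    length, n := protowire.ConsumeVarint(raw)
    if n < 0 {
        return nil, fmt.Errorf("invalid length prefix")
    }
    msg := dynamic.NewMessage(md)
    if err := msg.Unmarshal(raw[n : n+int(length)]); err != nil {
        return nil, err
    }
    return msg, nil
}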
It is now easier to collect new data and the inlet component is faster!

Notice: some paragraphs were editorialized by ChatGPT, using "editorialize and keep it short" as a prompt. The result was proofread by a human for correctness. The main idea is that ChatGPT should be better at English than me.


  1. While empty fields are not serialized to Protocol Buffers, empty columns in ClickHouse take some space, even if they compress well. Moreover, unused fields are still decoded and they may clutter the interface.
  2. There is a similar function using NetFlow. The NetFlow and IPFIX protocols are less complex to decode than sFlow, as they use a simpler TLV structure.
  3. vtprotobuf generates more optimized Go code by removing an abstraction layer. It directly generates the code encoding each field to bytes:
    if m.OutIfSpeed != 0 {
        i = encodeVarint(dAtA, i, uint64(m.OutIfSpeed))
        i--
        dAtA[i] = 0x6
        i--
        dAtA[i] = 0xd8
    }
    
  4. There is also a protoprint package to generate a proto definition file. I did not use it.

2 February 2023

Dirk Eddelbuettel: RInside 0.2.18 on CRAN: Maintenance

A new release 0.2.18 of RInside arrived on CRAN and in Debian today. This is the first release in ten months since the 0.2.17 release. RInside provides a set of convenience classes which facilitate embedding of R inside of C++ applications and programs, using the classes and functions provided by Rcpp. This release brings a contributed change to how the internal REPL is called: Dominick found the current form more reliable when embedding R on Windows. We also updated a few other things around the package. The list of changes since the last release:

Changes in RInside version 0.2.18 (2023-02-01)
  • The random number initialization was updated as in R.
  • The main REPL is now running via 'run_Rmainloop()'.
  • Small routine update to package and continuous integration.

My CRANberries also provides a short report with changes from the previous release. More information is on the RInside page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page, or to issue tickets at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

29 January 2023

Dirk Eddelbuettel: RcppTOML 0.2.2 on CRAN: Now with macOS-on-Intel Builds

Just days after a build-fix release (for aarch64), and still only a few weeks after the 0.2.0 release of RcppTOML and its switch to toml++, we have another bugfix release 0.2.2 on CRAN, also bringing release 3.3.0 of toml++ (even if we had large chunks of 3.3.0 already incorporated). TOML is a file format that is most suitable for configurations, as it is meant to be edited by humans but read by computers. It emphasizes strong readability for humans while at the same time supporting strong typing as well as immediate and clear error reports. On small typos you get parse errors, rather than silently corrupted garbage. Much preferable to any and all of XML, JSON or YAML, though sadly these may be too ubiquitous now. TOML is frequently used with projects such as the Hugo static blog compiler, or the Cargo system of crates (aka packages) for the Rust language. The package was building fine on Intel-based macOS provided the versions were recent enough. CRAN, however, aims for the broadest possible reach of binaries and builds on a fairly ancient macOS 10.13 with clang version 10. This confused toml++ into (wrongly) concluding it could not build when it in fact can. After a hint from Simon that Apple in their infinite wisdom redefines clang version ids, this has been reflected in version 3.3.0 of toml++ by Mark, so we should now build everywhere. Big thanks to everybody for the help. The short summary of changes follows.

Changes in version 0.2.2 (2023-01-29)
  • New toml++ version 3.3.0 with fix to permit compilation on ancient macOS systems as used by CRAN for the Intel-based builds.

Courtesy of my CRANberries, there is a diffstat report for this release. More information is on the RcppTOML page. Please use the GitHub issue tracker for issues and bugreports. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

26 January 2023

Dirk Eddelbuettel: RcppTOML 0.2.1 on CRAN: Small Build Fix for Some Arches

Two weeks after the release of RcppTOML 0.2.0 and the switch to toml++, we have a quick bugfix release 0.2.1. TOML is a file format that is most suitable for configurations, as it is meant to be edited by humans but read by computers. It emphasizes strong readability for humans while at the same time supporting strong typing as well as immediate and clear error reports. On small typos you get parse errors, rather than silently corrupted garbage. Much preferable to any and all of XML, JSON or YAML, though sadly these may be too ubiquitous now. TOML is frequently used with projects such as the Hugo static blog compiler, or the Cargo system of crates (aka packages) for the Rust language. Some architectures, aarch64 included, got confused over float16, which is of course a tiny two-byte type nobody should need. After consulting with Mark, we concluded to (at least for now) simply override this, excluding the use of float16. The short summary of changes follows.

Changes in version 0.2.1 (2023-01-25)
  • Explicitly set -DTOML_ENABLE_FLOAT16=0 to permit compilation on some architectures stumbling over the type.

Courtesy of my CRANberries, there is a diffstat report for this release. More information is on the RcppTOML page. Please use the GitHub issue tracker for issues and bugreports. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

22 January 2023

Dirk Eddelbuettel: Rcpp 1.0.10 on CRAN: Regular Update

The Rcpp team is thrilled to announce the newest release 1.0.10 of the Rcpp package, which is hitting CRAN now and will go to Debian shortly. Windows and macOS builds should appear at CRAN in the next few days, as will builds in different Linux distributions and of course at r2u. The release was prepared a few days ago, but given the widespread use at CRAN it took a few days to be processed. As always, our sincere thanks to the CRAN maintainers Uwe Ligges and Kurt Hornik. This release continues with the six-months cycle started with release 1.0.5 in July 2020. As a reminder, we do of course make interim snapshot dev or rc releases available via the Rcpp drat repo and strongly encourage their use and testing; I run my systems with these versions, which tend to work just as well, and are also fully tested against all reverse dependencies. Rcpp has become the most popular way of enhancing R with C or C++ code. Right now, around 2623 packages on CRAN depend on Rcpp for making analytical code go faster and further, along with 252 in BioConductor. On CRAN, 13.7% of all packages depend (directly) on Rcpp, and 58.7% of all compiled packages do. From the cloud mirror of CRAN (which is but a subset of all CRAN downloads), Rcpp has been downloaded 67.1 million times. This release is incremental as usual, preserving existing capabilities faithfully while smoothing out corners and/or extending slightly. Of particular note is the now fully-enabled use of unwind protection, making some operations a little faster by default; special thanks to Iñaki for spearheading this. Kevin and I also polished a few other bugs off as detailed below. The full list of details follows.

Changes in Rcpp release version 1.0.10 (2023-01-12)
  • Changes in Rcpp API:
    • Unwind protection is enabled by default (Iñaki in #1225). It can be disabled by defining RCPP_NO_UNWIND_PROTECT before including Rcpp.h. RCPP_USE_UNWIND_PROTECT is not checked anymore and has no effect. The associated plugin unwindProtect is therefore deprecated and will be removed in a future release.
    • The 'finalize' method for Rcpp Modules is now eagerly materialized, fixing an issue where errors can occur when Module finalizers are run (Kevin in #1231 closing #1230).
    • Zero-row data.frame objects can receive push_back or push_front (Dirk in #1233 fixing #1232).
    • One remaining sprintf has been replaced by snprintf (Dirk and Kevin in #1236 and #1237).
    • Several conversion warnings found by clang++ have been addressed (Dirk in #1240 and #1241).
  • Changes in Rcpp Attributes:
    • The C++20, C++2b (experimental) and C++23 standards now have plugin support like the other C++ standards (Dirk in #1228).
    • The source path for attributes received one more protection from spaces (Dirk in #1235 addressing #1234).
  • Changes in Rcpp Deployment:
    • Several GitHub Actions have been updated.

Thanks to my CRANberries, you can also look at a diff to the previous release. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bug reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues); questions are also welcome under the rcpp tag at StackOverflow, which also allows searching among the (currently) 2932 previous questions. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

21 January 2023

Dirk Eddelbuettel: RcppSimdJson 0.1.9 on CRAN: New Upstream

The RcppSimdJson package was just updated to release 0.1.9. RcppSimdJson wraps the fantastic and genuinely impressive simdjson library by Daniel Lemire and collaborators. Via very clever algorithmic engineering to obtain largely branch-free code, coupled with modern C++ and newer compiler instructions, it manages to parse gigabytes of JSON per second, which is quite mind-boggling. The best-case performance is faster than CPU speed, as use of parallel SIMD instructions and careful branch avoidance can lead to less than one CPU cycle per byte parsed; see the video of the talk by Daniel Lemire at QCon. This release updates the underlying simdjson library to version 3.0.1, settles on C++17 as the language standard, exports a worker function for direct C(++) access, and polishes a few small things around the package and tests. The NEWS entry for this release follows.

Changes in version 0.1.9 (2023-01-21)
  • The internal function deseralize_json is now exported at the C++ level as well as in R (Dirk in #81 closing #80).
  • simdjson was upgraded to version 3.0.1 (Dirk in #83).
  • The package now defaults to C++17 compilation; configure has been retired (Dirk closing #82).
  • The three main R access functions now use a more compact argument check via stopifnot (Dirk).

Courtesy of my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Dirk Eddelbuettel: RcppFastFloat 0.0.4 on CRAN: New Upstream

A new release of RcppFastFloat arrived on CRAN yesterday. The package wraps fast_float, another nice library by Daniel Lemire. For details, see the arXiv paper showing that one can convert character representations of numbers into floating point at rates at or exceeding one gigabyte per second. This release updates the underlying fast_float library version. Special thanks to Daniel Lemire for quickly accommodating a parsing use case we had encoded as a test, namely with various whitespace codes. The default in fast_float, as in C++17, is to be more narrow, but we enable the wider use case via two #define statements.

Changes in version 0.0.4 (2023-01-20)
  • Update to fast_float 3.9.0
  • Set two #define statements to re-establish prior behaviour with respect to whitespace removal prior to parsing for as.double2()
  • Small update to continuous integration actions

Courtesy of my CRANberries, there is also a diffstat report for this release. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

16 January 2023

Dirk Eddelbuettel: RcppArmadillo 0.11.4.3.1 on CRAN: Updates

Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 1034 other packages on CRAN, downloaded 27.6 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 509 times according to Google Scholar. This release brings another upstream bugfix iteration, 11.4.3, released in accordance with the aimed-for monthly release cadence. We had hoped to move away from suppressing deprecation warnings in this release, and had prepared over two dozen patch sets as well as pull requests, as documented in issue #391. However, it turns out that we both missed one or two needed sets of changes, and that two other sets of changes also trigger deprecation warnings. So we expanded issue #391, added issue #402, and prepared another eleven pull requests and patches today. With that, we can hopefully remove the suppression of these warnings by late April. The full set of changes (since the last CRAN release 0.11.4.2.1) follows.

Changes in RcppArmadillo version 0.11.4.3.1 (2023-01-14)
  • The #define ARMA_IGNORE_DEPRECATED_MARKER remains active to suppress the (upstream) deprecation warnings, see #391 and #402 for details.

Changes in RcppArmadillo version 0.11.4.3.0 (2022-12-28) (GitHub Only)
  • Upgraded to Armadillo release 11.4.3 (Ship of Theseus)
    • fix corner case in pinv() when processing symmetric matrices
  • Protect the undefine of NDEBUG behind additional opt-in define

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

13 January 2023

Dirk Eddelbuettel: RcppGSL 0.3.13 on CRAN: Mandated Update

A new release 0.3.13 of RcppGSL is now on CRAN. The RcppGSL package provides an interface from R to the GNU GSL by relying on the Rcpp package. This release contains one change, made at the request of a CRAN email in light of possible future changes for the C standards C17 and then C23: it removes a compiler check from configure.ac. This is both a fair point (as our src/Makevars does not actually set a compiler) and also a little marginal. The NEWS entry follows:

Changes in version 0.3.13 (2023-01-12)
  • Remove 'AC_PROG_CC' from 'configure.ac' per CRAN wish

Courtesy of CRANberries, a summary of changes in the most recent release is also available. More information is on the RcppGSL page. Questions, comments etc should go to the issue tickets at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
